    Metaphorical Expressions in Automatic Arabic Sentiment Analysis

    In recent years, Arabic language resources and NLP tools have been developing rapidly. Sentiment analysis is one of the important tasks in Arabic natural language processing. While significant improvements have been achieved in this research area, existing computational models and tools still lack the ability to deal with Arabic metaphorical expressions. Metaphors play an important role in Arabic due to the language's unique history and culture: they provide a linguistic mechanism for expressing ideas and notions that may differ from their surface form. Therefore, to identify the true sentiment of Arabic language data, a computational model needs to be able to "read between the lines". In this paper, we examine the issue of metaphors in automatic Arabic sentiment analysis through an experiment in which we observe the performance of a state-of-the-art Arabic sentiment tool on metaphors and analyse the results to gain deeper insight into the issue. Our experiment clearly shows that metaphors have a significant impact on the performance of current Arabic sentiment tools; developing Arabic language resources and computational models for Arabic metaphors is therefore an important task.
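
    The experiment's outline is straightforward to reproduce: run an off-the-shelf Arabic sentiment classifier on literal and metaphorical inputs and compare its outputs. A minimal sketch follows, using one publicly available Arabic sentiment checkpoint purely for illustration (the paper's evaluated tool is not named here and may differ):

```python
# Illustrative probe: does an Arabic sentiment model handle metaphor?
# The checkpoint below is a public Arabic sentiment model used only as
# an example; it is not necessarily the tool evaluated in the paper.
from transformers import pipeline

sentiment = pipeline(
    "text-classification",
    model="CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment",
)

examples = [
    "الخدمة سيئة جدا",  # literal: "the service is very bad" (negative)
    "قلبه من ذهب",      # metaphor: "his heart is of gold" (positive)
]
for text in examples:
    print(text, "->", sentiment(text)[0])  # {'label': ..., 'score': ...}
```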

    Creating and validating multilingual semantic representations for six languages: expert versus non-expert crowds

    Creating high-quality, wide-coverage multilingual semantic lexicons to support knowledge-based approaches is a challenging and time-consuming manual task. It has traditionally been performed by linguistic experts: a slow and expensive process. We present an experiment in which we adapt and evaluate crowdsourcing methods that employ native speakers to generate a list of coarse-grained senses under a common multilingual semantic taxonomy for sets of words in six languages. 451 non-experts (including 427 Mechanical Turk workers) and 15 expert participants manually annotated 250 words with semantic senses for the Arabic, Chinese, English, Italian, Portuguese and Urdu lexicons. To avoid erroneous (spam) crowdsourced results, we used a novel task-specific two-phase filtering process in which users were asked to identify synonyms in the target language and to remove erroneous senses.
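
    The two-phase filter lends itself to a simple formulation: phase one screens each worker against words whose synonyms are already known, and phase two keeps sense annotations only from workers who passed. A hedged sketch, with a hypothetical threshold and data structures that are not from the paper:

```python
# Sketch of a task-specific two-phase crowdsourcing filter.
# The 0.8 threshold and the data structures are hypothetical.

def passes_synonym_check(answers, gold_synonyms, threshold=0.8):
    """Phase 1: the worker must recognise enough known synonyms."""
    correct = sum(1 for word, choice in answers.items()
                  if choice in gold_synonyms.get(word, set()))
    return correct / max(len(answers), 1) >= threshold

def filter_annotations(workers, gold_synonyms):
    """Phase 2: keep sense annotations only from passing workers."""
    kept = []
    for worker in workers:
        if passes_synonym_check(worker["synonym_answers"], gold_synonyms):
            kept.extend(worker["sense_annotations"])
    return kept
```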

    Development of the multilingual semantic annotation system

    This paper reports on our research to generate multilingual semantic lexical resources and to develop multilingual semantic annotation software, which assigns each word in running text to a semantic category based on a lexical semantic classification scheme. Such tools play an important role in developing intelligent multilingual NLP, text mining and ICT systems. In this work, we aim to extend an existing English semantic annotation tool to cover a range of languages, namely Italian, Chinese and Brazilian Portuguese, by bootstrapping new semantic lexical resources: we automatically translate existing English semantic lexicons into these languages using a set of bilingual dictionaries and word lists. In our experiment, with minor manual improvement of the automatically generated semantic lexicons, the prototype tools based on the new lexicons achieved an average lexical coverage of 79.86% and an average annotation precision of 71.42% (if only precise annotations are considered) or 84.64% (if partially correct annotations are included) across the three languages. Our experiment demonstrates that it is feasible to rapidly develop prototype semantic annotation tools for new languages by automatically bootstrapping new semantic lexicons from existing ones.
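
    The bootstrapping step itself can be pictured as carrying semantic tags across a bilingual dictionary. A minimal sketch with invented entries and deliberately simplified formats (the real lexicons and dictionaries are far larger and richer):

```python
# Bootstrap a target-language semantic lexicon from an English one by
# letting each translation inherit the English word's semantic tags.
# Entries, tags and formats here are invented for illustration only.
english_lexicon = {
    "bank": {"I1.1", "W3"},   # money-related / geographical senses
    "river": {"W3"},
}
en_to_it = {                  # bilingual dictionary, English -> Italian
    "bank": ["banca", "riva"],
    "river": ["fiume"],
}

italian_lexicon = {}
for en_word, tags in english_lexicon.items():
    for it_word in en_to_it.get(en_word, []):
        # inherited tags are later pruned by minor manual improvement,
        # e.g. removing the money tag from "riva" (riverbank)
        italian_lexicon.setdefault(it_word, set()).update(tags)

print(italian_lexicon)
```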

    Feasibility of Emotions as Features for Suicide Ideation Detection in Social Media

    Detecting suicide-related social media messages is an important problem: such messages can reveal warning signs of suicidal behaviour. This paper examines the efficacy of using emotions as the sole features for detecting suicide-related messages. We investigated two methods, which use a single emotion and a set of seven emotions as features, respectively. For emotion classification, we used a transformer-based classifier named "Emotion English DistilRoBERTa-base". For detecting suicide-related messages, we tested Naive Bayes and Support Vector Machine classifiers. As our training/test data for suicide message detection, we used a publicly available dataset collected from Reddit in which each post is labelled as "suicide" or "non-suicide". Our method obtained accuracies of 76.2% and 76.8% for detecting suicide-related messages with Naive Bayes and Support Vector Machine respectively. Our experiment also shows that three emotion categories, "anger", "fear" and "sadness", have the strongest correlation with suicide-related messages.
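
    A rough sketch of the seven-emotion feature pipeline described above, assuming the publicly released checkpoint j-hartmann/emotion-english-distilroberta-base (a common public distribution of "Emotion English DistilRoBERTa-base") and placeholder texts in place of the labelled Reddit dataset:

```python
# Sketch: emotion scores as the only features for suicide-message detection.
# Assumes the public checkpoint "j-hartmann/emotion-english-distilroberta-base",
# which scores 7 emotions: anger, disgust, fear, joy, neutral, sadness, surprise.
from transformers import pipeline
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
import numpy as np

emotion = pipeline("text-classification",
                   model="j-hartmann/emotion-english-distilroberta-base",
                   top_k=None)  # return scores for all 7 emotions

def emotion_features(texts):
    """Turn each text into a fixed-order 7-dim vector of emotion scores."""
    order = ["anger", "disgust", "fear", "joy", "neutral", "sadness", "surprise"]
    feats = []
    for scores in emotion(texts, truncation=True):
        by_label = {s["label"]: s["score"] for s in scores}
        feats.append([by_label[l] for l in order])
    return np.array(feats)

# Placeholder data; the paper uses a labelled Reddit dataset instead.
texts = ["I can't see a way forward anymore", "Great day at the park!"]
labels = [1, 0]  # 1 = suicide-related, 0 = not

X = emotion_features(texts)
clf = SVC().fit(X, labels)  # or GaussianNB().fit(X, labels) for Naive Bayes
print(clf.predict(emotion_features(["feeling hopeless tonight"])))
```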

    Building a Spanish lexicon for corpus analysis

    This paper describes the creation of a semantically annotated Spanish lexicon for analysing larger corpora in the Spanish language. The semantic resources most widely used today, WordNet, FrameNet, PDEV and USAS, have mainly served English-language research. A large Spanish lexicon will allow a greater number of corpus studies in Spanish to be undertaken. We describe the steps followed in constructing the lexicon, the difficulties encountered along the way, and the solutions used to overcome them. Finally, the lexicon will enable specific research tasks to be carried out, such as metaphor analysis, critical discourse analysis (CDA) studies and even NLP studies.
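
    For a sense of what such a resource contains, a USAS-style lexicon pairs each word form and part of speech with one or more semantic tags. The Spanish entries below are invented examples, not the paper's data:

```python
# Illustrative USAS-style lexicon entries; the words and the tag
# assignments are invented examples, not the paper's actual data.
spanish_lexicon = {
    ("casa", "NOUN"): ["H1"],     # H1 = architecture, houses, buildings
    ("correr", "VERB"): ["M1"],   # M1 = moving, coming and going
    ("feliz", "ADJ"): ["E4.1+"],  # E4.1+ = happy (positive polarity)
}
```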

    Contrastive Training with More Data

    This paper proposes a new method of contrastive training over multiple data points, focusing on the scaling issue that arises when using in-batch negatives. Our approach compares transformer training with dual encoders against training with multiple encoders. Our method offers a feasible way to improve loss modelling as encoders scale.
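
    For context, the in-batch-negatives setup whose scaling behaviour the paper targets is usually written as an InfoNCE-style loss over a batch similarity matrix: each query's positive is its paired document, and every other document in the batch serves as a negative, so batch size controls how many negatives each update sees. A generic dual-encoder sketch (not the paper's proposed multi-encoder method):

```python
# Generic dual-encoder contrastive loss with in-batch negatives.
# Positives sit on the diagonal of the (B, B) similarity matrix;
# all off-diagonal documents act as negatives for each query.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(q, d, temperature=0.05):
    """q, d: (B, dim) L2-normalised query/document embeddings."""
    logits = q @ d.T / temperature      # (B, B) similarity matrix
    targets = torch.arange(q.size(0))   # positives on the diagonal
    return F.cross_entropy(logits, targets)

B, dim = 32, 128
q = F.normalize(torch.randn(B, dim), dim=-1)
d = F.normalize(torch.randn(B, dim), dim=-1)
print(in_batch_contrastive_loss(q, d))
```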